Objective: The data provided herein represent the whole-genome resequencing data of three wolves and three Iranian local dogs. Understanding genome evolution during animal domestication is an interesting subject in genome biology. The dog is an excellent model for understanding domestication because of its considerable variety of behavioral and physical traits. The Zagros area of present-day Iran has been identified as one of the initial centers of animal domestication. The availability of complete genome sequences of Iranian local canids can be a valuable resource for researchers to address questions and test hypotheses on the dog domestication process.


Data description: We collected blood samples from six Iranian local canids, comprising two hunting dogs (Saluki breed), a mastiff dog (Qahderijani ecotype) and three wolves, and extracted genomic DNA from them. Sequence data were produced using the Illumina HiSeq 2500 system. All sequence data are available in the National Genomics Data Center (NGDC) Genome Sequence Archive (GSA) database under accession CRA001324 and at the National Center for Biotechnology Information (NCBI) under accession PRJNA639312. The short-read sequences, with a mean depth of 16X, were aligned to the dog reference genome (CanFam3.1) and covered 99% of the reference assembly. The obtained information from this experiment will be useful in evolutionary


Objective 

Dogs (Canis familiaris) were probably the earliest domesticated animals and among the first human companions in ancient times [1, 2]. Archaeological findings and genetic research indicate that dog breeds derive from wild wolves [3-5]. In Southwest Asia, large-scale farming spread within the so-called Fertile Crescent (FC), where plants and animals were independently domesticated [6, 7]. Extensive cultural advances occurred in the Zagros area of present-day Iraq and Iran, which connects the Iranian plateau and Mesopotamia [8]. Dogs were depicted frequently in Southwest Asia [1, 9]. Consequently, one notable view places the primary location of dog domestication in Southwest Asia, likely the Middle East [1]. Moreover, the Middle East has been implicated by the considerable sharing of alleles between dog breeds and wolves [10]; however, this interpretation has been questioned because of the dog-wolf hybridization reported in previous studies [11-13]. The dog is a striking example of phenotypic variation under artificial selection and demographic forces, but the genetic basis of this diversity is not yet completely clear. Therefore, the availability of whole-genome resequencing data of Iranian local canids will provide an opportunity for researchers to trace the origin of dog domestication. We sequenced the genomes of six Iranian local canids: two hunting dogs (Saluki breed), a mastiff dog (Qahderijani ecotype) and three wolves (Table 1). We used these data to identify genomic variants and their effects in dogs and wolves [14].


Data description 

We collected blood samples from three Iranian local dogs and three Iranian local wolves, with the approval of the owners, at six different sites in Iran. The Saluki dogs were sampled on Jamil Tavanaei's personal farms in the Kurdistan zone (Sanandaj and Bijar), and the Qahderijani dog was sampled on Alireza Hoseini's private farm in the Isfahan zone. One wolf sample was collected at the Kerman zoological garden in the Kerman zone, and the others were collected at the Eram zoological garden in the Tehran zone. DNA was extracted with the phenol/chloroform method. For sequencing library preparation, the genomic DNA was sheared to fragments of 300-500 bp, which were then end-repaired, A-tailed, and ligated to Illumina sequencing adapters. The ligated products with sizes of 400-500 bp were selected on 2% agarose gels and then amplified by LM-PCR. Illumina paired-end whole-genome resequencing of the six individuals was performed on the Illumina HiSeq 2500 system (http://www.berrygenomics.com). Both nuclear and mitochondrial genomes were sequenced. We generated 287.5 Gb of data with a uniform read length of 150 bp. A total of


Table 1 Overview of whole-genome sequence data files of six Iranian canids

Label | Name of data file/data set | File types (extension) | Data repository and identifier (DOI or accession number)
Bioproject [24] | Whole genome resequencing of the Iranian native dogs and wolves | No file type | PRJCA001183; https://bigd.big.ac.cn/bioproject/browse/PRJCA001183
Data file 1 [25-28] | YPi2985_L4_1_clean.fq.gz; YPi2985_L4_2_clean.fq.gz; YPi2985_L5_1_clean.fq.gz; YPi2985_L5_2_clean.fq.gz; YPi2985_L7_1_clean.fq.gz; YPi2985_L7_2_clean.fq.gz; YPi2985_L8_1_clean.fq.gz; YPi2985_L8_2_clean.fq.gz | FASTQ (fq.gz) | NGDC, Genome Sequence Archive: https://bigd.big.ac.cn/gsa/browse/CRA001324/CRR042720; https://bigd.big.ac.cn/gsa/browse/CRA001324/CRR042721; https://bigd.big.ac.cn/gsa/browse/CRA001324/CRR042722; https://bigd.big.ac.cn/gsa/browse/CRA001324/CRR042723
Data file 2 [29-31] | b_L3_1_clean.fq.gz; b_L3_2_clean.fq.gz; b_L4_1_clean.fq.gz; b_L4_2_clean.fq.gz; b_L6_1_clean.fq.gz; b_L6_2_clean.fq.gz | FASTQ (fq.gz) | NGDC, Genome Sequence Archive: https://bigd.big.ac.cn/gsa/browse/CRA001324/CRR042724; https://bigd.big.ac.cn/gsa/browse/CRA001324/CRR042725; https://bigd.big.ac.cn/gsa/browse/CRA001324/CRR042726
Data file 3 [32-34] | 8-a_L5_1_clean.fq.gz; 8-a_L5_2_clean.fq.gz; 8-a_L6_1_clean.fq.gz; 8-a_L6_2_clean.fq.gz; 8-a_L8_1_clean.fq.gz; 8-a_L8_2_clean.fq.gz | FASTQ (fq.gz) | NGDC, Genome Sequence Archive: https://bigd.big.ac.cn/gsa/browse/CRA001324/CRR042727; https://bigd.big.ac.cn/gsa/browse/CRA001324/CRR042728; https://bigd.big.ac.cn/gsa/browse/CRA001324/CRR042729
Data file 4 [35-37] | 74_L1_1_clean.fq.gz; 74_L1_2_clean.fq.gz; 74_L5_1_clean.fq.gz; 74_L5_2_clean.fq.gz; 74_L8_1_clean.fq.gz; 74_L8_2_clean.fq.gz | FASTQ (fq.gz) | NGDC, Genome Sequence Archive: https://bigd.big.ac.cn/gsa/browse/CRA001324/CRR042730; https://bigd.big.ac.cn/gsa/browse/CRA001324/CRR042731; https://bigd.big.ac.cn/gsa/browse/CRA001324/CRR042732
Data file 5 [38-41] | 85_L5_1_clean.fq.gz; 85_L5_2_clean.fq.gz; 85_L6_1_clean.fq.gz; 85_L6_2_clean.fq.gz; 85_L7_1_clean.fq.gz; 85_L7_2_clean.fq.gz; 85_L8_1_clean.fq.gz; 85_L8_2_clean.fq.gz | FASTQ (fq.gz) | NGDC, Genome Sequence Archive: https://bigd.big.ac.cn/gsa/browse/CRA001324/CRR042733; https://bigd.big.ac.cn/gsa/browse/CRA001324/CRR042734; https://bigd.big.ac.cn/gsa/browse/CRA001324/CRR042735; https://bigd.big.ac.cn/gsa/browse/CRA001324/CRR042736
Data file 6 [42-45] | 1_L1_1_clean.fq.gz; 1_L1_2_clean.fq.gz; 1_L2_1_clean.fq.gz; 1_L2_2_clean.fq.gz; 1_L3_1_clean.fq.gz; 1_L3_2_clean.fq.gz; 1_L4_1_clean.fq.gz; 1_L4_2_clean.fq.gz | FASTQ (fq.gz) | NGDC, Genome Sequence Archive: https://bigd.big.ac.cn/gsa/browse/CRA001324/CRR042737; https://bigd.big.ac.cn/gsa/browse/CRA001324/CRR042738; https://bigd.big.ac.cn/gsa/browse/CRA001324/CRR042739; https://bigd.big.ac.cn/gsa/browse/CRA001324/CRR042740
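As an illustration of how the paired-end files listed in Table 1 might be organised before alignment, the following minimal Python sketch groups file names by sample and lane. It assumes the naming pattern <sample>_L<lane>_<read>_clean.fq.gz inferred from the table; the function name pair_fastq_files is hypothetical and is not part of the pipeline described in this note.

```python
import re
from collections import defaultdict

# Naming pattern assumed from Table 1: <sample>_L<lane>_<read>_clean.fq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_L(?P<lane>\d+)_(?P<read>[12])_clean\.fq\.gz$"
)


def pair_fastq_files(names):
    """Group names into {(sample, lane): {1: forward_file, 2: reverse_file}}."""
    pairs = defaultdict(dict)
    for name in names:
        match = FASTQ_NAME.match(name)
        if match is None:
            raise ValueError(f"unexpected file name: {name}")
        key = (match["sample"], int(match["lane"]))
        pairs[key][int(match["read"])] = name
    # Each lane should contribute exactly one forward and one reverse file.
    incomplete = [key for key, reads in pairs.items() if set(reads) != {1, 2}]
    if incomplete:
        raise ValueError(f"lanes missing a mate file: {incomplete}")
    return dict(pairs)


files = [
    "85_L5_1_clean.fq.gz", "85_L5_2_clean.fq.gz",
    "85_L6_1_clean.fq.gz", "85_L6_2_clean.fq.gz",
]
pairs = pair_fastq_files(files)
```

Raising an error on an unmatched or unpaired file is a deliberate choice here: a silently skipped lane would lower the apparent sequencing depth of that sample.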


Amiri Ghanatsaman et al. BMC Res Notes (2020) 13:436 


1,884,054,828 short reads were generated for the six individuals. After filtering, the total high-quality sequence data per individual ranged from 42.1 Gb to 51 Gb, and coverage ranged from 14.51X to 17.15X. Across samples, the mean insert size ranged from 280.06 to 331.86 and its standard deviation from 27.12 to 33.94.
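The relation between read count, read length and sequencing depth can be sketched as follows. This is a back-of-the-envelope calculation, not the procedure used to obtain the coverage values reported here: the genome size of 2.4 Gb is an assumed round figure for the dog genome, and the function names are hypothetical. Because it counts every read before mapping, the result should be read as an upper bound on the mapped depth.

```python
def total_bases(n_reads, read_length_bp):
    """Total sequenced bases for reads of uniform length."""
    return n_reads * read_length_bp


def expected_depth(bases, genome_size_bp):
    """Average number of times each genome position is sequenced."""
    return bases / genome_size_bp


# Assumption: approximate dog genome size; not a value taken from this note.
ASSUMED_GENOME_SIZE_BP = 2.4e9

N_READS = 1_884_054_828   # total short reads reported for the six individuals
READ_LENGTH_BP = 150      # uniform read length reported above
N_INDIVIDUALS = 6

bases = total_bases(N_READS, READ_LENGTH_BP)
mean_bases_per_individual = bases / N_INDIVIDUALS
raw_depth = expected_depth(mean_bases_per_individual, ASSUMED_GENOME_SIZE_BP)

print(f"total bases: {bases / 1e9:.1f} Gb")
print(f"mean bases per individual: {mean_bases_per_individual / 1e9:.1f} Gb")
print(f"raw depth per individual (upper bound): {raw_depth:.1f}X")
```

The mean of about 47 Gb per individual falls inside the reported range of 42.1 Gb to 51 Gb of high-quality data.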

The quality of the raw sequence reads was assessed with FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). We used BWA (v.0.7.15) [15] to align the sequence data to the reference genome (CanFam3.1) downloaded from Ensembl (http://asia.ensembl.org/Canis_lupus_familiaris/Info/Index). Alignment quality was assessed with SAMtools v.1.9 using the flagstat and depth commands [16]. The short-read sequences, with a mean depth of 16X, were mapped to the dog reference genome (CanFam3.1) and covered 99% of the reference assembly. The mapping output files were preprocessed using SAMtools [16], the Picard tools (http://broadinstitute.github.io/picard/) and GATK tools [17]. We applied a variome detection pipeline to these data using the CNVnator [18], BreakDancer [19], DELLY [20] and Bedtools [21] programs [14]. Finally, we compared the effects of the variome between the dog and wolf genomes using the Sorting Intolerant from Tolerant (SIFT) algorithm [19], Ensembl annotation [22] and the DAVID [23] tool [14]. The data presented herein, together with our previously published mitochondrial DNA sequences from Iranian dogs [11], will provide useful resources for understanding the genetic structure of Iranian dogs and for testing hypotheses on dog origin and domestication.
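Summary figures such as mean depth and the fraction of the reference covered can be derived from per-position depth output. The minimal Python sketch below assumes three tab-separated columns per line (contig, position, depth), as written by samtools depth for a single alignment file. The function name summarize_depth is hypothetical, and the exact commands and options used in this work are not specified above. The covered fraction is meaningful only if zero-depth positions are present in the input, for example when samtools depth is run with the -a option.

```python
import io


def summarize_depth(lines):
    """Return (mean depth, fraction of positions with depth >= 1).

    Each non-empty line must hold three tab-separated fields:
    contig name, 1-based position, and read depth at that position.
    """
    positions = 0
    covered = 0
    depth_sum = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        _contig, _pos, depth_text = line.split("\t")
        depth = int(depth_text)
        positions += 1
        depth_sum += depth
        if depth > 0:
            covered += 1
    if positions == 0:
        raise ValueError("no depth records found")
    return depth_sum / positions, covered / positions


# Toy input with four positions, one of them uncovered.
example = io.StringIO("chr1\t1\t16\nchr1\t2\t0\nchr1\t3\t20\nchr1\t4\t12\n")
mean_depth, breadth = summarize_depth(example)
print(f"mean depth: {mean_depth:.2f}X, covered fraction: {breadth:.2%}")
```

Reading the input line by line keeps memory use constant, which matters for a genome-wide depth file with one record per reference position.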


Limitations 

The sample size for the dog and wolf populations is a limitation of our work: we were able to generate genome sequence data from only three wolves and three dogs. In addition, we produced short reads with a mean depth of 16X, which is a medium depth and might not be suitable for some genomic analyses.